{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Generate a list of SMACT-allowed compositions\n",
    "This notebook provides a short demo of how to use SMACT to generate a list of element compositions that could later be used as input for machine learning or some other  screening workflow. \n",
    "We use the standard `smact_filter` as described in the [docs](https://smact.readthedocs.io/en/latest/examples.html#neutral-combinations) and outlined more fully in [this research paper](https://www.ncbi.nlm.nih.gov/pubmed/27790643).\n",
    "\n",
    "In the example below, we generate ternary oxide compositions of the first row transition metals."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "### Imports\n",
    "from smact import Element, element_dictionary, ordered_elements\n",
    "from smact.screening import smact_filter\n",
    "from datetime import datetime\n",
    "import itertools\n",
    "import multiprocessing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We define the elements we are interested in:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## List generation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['Sc', 'Ti', 'V', 'Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn']\n"
     ]
    }
   ],
   "source": [
    "all_el = element_dictionary()   # A dictionary of all element objects\n",
    "\n",
    "# Say we are just interested in first row transition metals\n",
    "els = [all_el[symbol] for symbol in ordered_elements(21,30)]\n",
    "\n",
    "# We can print the symbols \n",
    "print([i.symbol for i in els])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will investiage ternary M1-M2-O combinations exhaustively, where M1 and M2 are different transition metals."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Sc Ti O\n",
      "Sc V O\n",
      "Sc Cr O\n",
      "Sc Mn O\n",
      "Sc Fe O\n",
      "Sc Co O\n",
      "Sc Ni O\n",
      "Sc Cu O\n",
      "Sc Zn O\n",
      "Ti V O\n",
      "Ti Cr O\n",
      "Ti Mn O\n",
      "Ti Fe O\n",
      "Ti Co O\n",
      "Ti Ni O\n",
      "Ti Cu O\n",
      "Ti Zn O\n",
      "V Cr O\n",
      "V Mn O\n",
      "V Fe O\n",
      "V Co O\n",
      "V Ni O\n",
      "V Cu O\n",
      "V Zn O\n",
      "Cr Mn O\n",
      "Cr Fe O\n",
      "Cr Co O\n",
      "Cr Ni O\n",
      "Cr Cu O\n",
      "Cr Zn O\n",
      "Mn Fe O\n",
      "Mn Co O\n",
      "Mn Ni O\n",
      "Mn Cu O\n",
      "Mn Zn O\n",
      "Fe Co O\n",
      "Fe Ni O\n",
      "Fe Cu O\n",
      "Fe Zn O\n",
      "Co Ni O\n",
      "Co Cu O\n",
      "Co Zn O\n",
      "Ni Cu O\n",
      "Ni Zn O\n",
      "Cu Zn O\n"
     ]
    }
   ],
   "source": [
    "# Generate all M1-M2 combinations\n",
    "metal_pairs = itertools.combinations(els, 2)\n",
    "# Add O to each pair \n",
    "ternary_systems = [[*m, Element('O')] for m in metal_pairs]\n",
    "# Prove to ourselves that we have all unique chemical systems\n",
    "for i in ternary_systems:\n",
    "    print(i[0].symbol, i[1].symbol, i[2].symbol)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time taken to generate list:  0:00:00.448839\n"
     ]
    }
   ],
   "source": [
    "# Use multiprocessing and smact_filter to quickly generate our list of compositions\n",
    "start = datetime.now()\n",
    "if __name__ == '__main__':   # Always use pool protected in an if statement \n",
    "    with multiprocessing.Pool(processes=4) as p:    # start 4 worker processes\n",
    "        result = p.map(smact_filter, ternary_systems)\n",
    "print('Time taken to generate list:  {0}'.format(datetime.now()-start))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of compositions: --> 16615 <--\n",
      "Each list entry looks like this:\n",
      "  elements, oxidation states, stoichiometries\n",
      "(('Sc', 'Ti', 'O'), (1, -1, -2), (3, 1, 1))\n",
      "(('Sc', 'Ti', 'O'), (1, -1, -2), (4, 2, 1))\n",
      "(('Sc', 'Ti', 'O'), (1, -1, -2), (5, 1, 2))\n",
      "(('Sc', 'Ti', 'O'), (1, -1, -2), (5, 3, 1))\n",
      "(('Sc', 'Ti', 'O'), (1, -1, -2), (6, 4, 1))\n"
     ]
    }
   ],
   "source": [
    "# Flatten the list of lists\n",
    "flat_list = [item for sublist in result for item in sublist]\n",
    "print('Number of compositions: --> {0} <--'.format(len(flat_list)))\n",
    "print('Each list entry looks like this:\\n  elements, oxidation states, stoichiometries')\n",
    "for i in flat_list[:5]:\n",
    "    print(i)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Next steps\n",
    "### Pymatgen reduced formulas\n",
    "We could turn the compositions into reduced formula using pymatgen (we lost the oxidation state information in this example)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Each list entry now looks like this: \n",
      "Sc3TiO\n",
      "Sc4Ti2O\n",
      "Sc5TiO2\n",
      "Sc5Ti3O\n",
      "Sc6Ti4O\n"
     ]
    }
   ],
   "source": [
    "from pymatgen import Composition\n",
    "def comp_maker(comp):\n",
    "    form = []\n",
    "    for el, ammt in zip(comp[0], comp[2]):\n",
    "        form.append(el)\n",
    "        form.append(ammt)\n",
    "    form = ''.join(str(e) for e in form)\n",
    "    pmg_form = Composition(form).reduced_formula\n",
    "    return pmg_form\n",
    "\n",
    "if __name__ == '__main__':  \n",
    "    with multiprocessing.Pool(processes=4) as p:\n",
    "        pretty_formulas = p.map(comp_maker, flat_list)\n",
    "\n",
    "print('Each list entry now looks like this: ')\n",
    "for i in pretty_formulas[:5]:\n",
    "    print(i)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pandas\n",
    "Finally, we could put this into a pandas DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>pretty_formula</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>9353</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>unique</th>\n",
       "      <td>9353</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>top</th>\n",
       "      <td>Ti3V4O7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>freq</th>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       pretty_formula\n",
       "count            9353\n",
       "unique           9353\n",
       "top           Ti3V4O7\n",
       "freq                1"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "new_data = pd.DataFrame({'pretty_formula': pretty_formulas})\n",
    "# Drop any duplicate compositions\n",
    "new_data = new_data.drop_duplicates(subset = 'pretty_formula')\n",
    "new_data.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Next steps\n",
    "The dataframe can then be featurized for representation to a machine learning algorithm, for example in Scikit-learn. Below is a code snippet from a [publicly avalable example](https://github.com/WMD-group/Solar_oxides_data) to demonstrate this using [the matminer package](https://github.com/hackingmaterials/matminer):\n",
    "\n",
    "``` python\n",
    "\n",
    "from matminer.featurizers.conversions import StrToComposition\n",
    "from matminer.featurizers.base import MultipleFeaturizer\n",
    "from matminer.featurizers import composition as cf\n",
    "\n",
    "# Use featurizers from matminer to featurize data\n",
    "str_to_comp = StrToComposition(target_col_id='composition_obj')\n",
    "str_to_comp.featurize_dataframe(new_data, col_id='pretty_formula')\n",
    "\n",
    "feature_calculators = MultipleFeaturizer([cf.Stoichiometry(), \n",
    "                         cf.ElementProperty.from_preset(\"magpie\"),\n",
    "                         cf.ValenceOrbital(props=['avg']), \n",
    "                         cf.IonProperty(fast=True),\n",
    "                         cf.BandCenter(), cf.AtomicOrbitals()])\n",
    "\n",
    "feature_labels = feature_calculators.feature_labels()\n",
    "feature_calculators.featurize_dataframe(new_data, col_id='composition_obj');\n",
    "\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "___"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "_D. W. Davies - 20th Feb 2019_"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
